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ABSTRACT 


Esophageal Squamous Cell Carcinoma (ESCC) is among the most common 
malignant cancers worldwide. In the past, extensive efforts have been made to 
characterize the involvement of protein-coding genes in ESCC tumorigenesis but few 
for long noncoding RNAs (IncRNAs). To investigate the transcriptome profile and 
functional relevance of IncRNAs, we performed an integrative analysis of a customized 
combined IncRNA-mRNA microarray and RNA-seq data on ESCCs and matched normal 
tissues. We identified numerous IncRNAs that were differentially expressed between 
the normal and tumor tissues, termed “ESCC-associated IncRNAs (ESCALs)”, of 
which, the majority displayed restricted expression pattern. Also, a subset of ESCALs 
appeared to be associated with ESCC patient survival. Gene set enrichment analysis 
(GSEA) further suggested that over half of the ESCALs were positively- or negatively- 
associated with metastasis. Among these, we identified a novel nuclear-retained 
IncRNA, named Epist, which is generally highly expressed in esophagus, and which 
is down-regulated during ESCC progression. Epist over-expression and knockdown 
studies further suggest that Epist inhibits the metastasis, acting as a tumor suppressor 
in ESCC. Collectively, our analysis of the ESCC transcriptome identified the potential 
tumor suppressing IncRNA Epist, and provided a foundation for future efforts to 
identify functional IncRNAs for cancerous therapeutic targeting. 


www.impactjournals.com/oncotarget 


INTRODUCTION 


Human esophageal cancer is one of the most deadly 
cancers, ranking as the sixth leading cause of cancer- 
related deaths worldwide [1]. In China, esophageal 
squamous cell carcinoma (ESCC) is the major subtype of 
esophageal cancer, accounting for over 90% of the cases 
[2]. Despite improvements in diagnostic techniques and 
therapeutic modalities, ESCC still remains a devastating 
malignancy, mainly due to the late diagnoses and the 
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aggressive features of the disease. As with other cancers, 
the pathogenesis of ESCC appears to result from the 
dysregulation of protein-coding and noncoding genes 
involved in a number of vital functions such as cell cycle 
control, cell differentiation, cell migration, invasion 
and other cancer-related pathways. Recent studies of 
ESCC cell lines and clinical samples have suggested that 
epigenetic silencing, e.g. DNA methylation, of key tumor 
suppressor genes is critical to the initial and progressive 
steps of ESCC [3, 4]. However, the detailed mechanisms 
leading to ESCC have not yet been fully elucidated. 
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Identification of aberrantly expressed coding genes as 
well as noncoding RNAs may be crucial steps for both the 
diagnosis of ESCC and subsequent personalized treatment. 
Recently, a number of protein-coding genes, such as 
PTK6 and Rab25, have been identified as having tumor 
suppressor activity through modulating different signaling 
pathways [5, 6]. High levels of DNA methylation at the 
promoter regions of these genes in ESCC cell lines are 
common observations [3]. Several microRNAs have 
also been linked to various aspects of ESCC, such as 
promotion of cell migration and invasion (microRNA-25) 
or reduction of the cell cycle (miR-29c), thus being 
potential biomarkers of this disease [7, 8]. 

Long noncoding RNAs (IncRNAs) are identified 
as non-protein-coding transcripts, which are longer than 
200 nt and are often spliced, 3’-polyadenylated and have 
a 7-methylguanosine cap at 5’-terminal. LncRNAs are 
believed to be implicated in diverse biological processes 
and disease-related pathways in both vertebrates and 
invertebrates, controlling the aspects of processes such as 
cellular proliferation, development, lineage commitment, 
immune response, pluripotency and differentiation [9-13]. 
The identification of IncRNAs has expanded our notions 
of the transcriptome complexity and gene regulatory 
network. Although many efforts have been made to 
annotate the IncRNAs encoded in the human genome, the 
functions of most IncRNAs are still uncertain. And only a 
few dozen of the IncRNAs have been well characterized 
to date. For example, HOTAIR [14] and ANRIL [15] 
facilitate gene repression through recruiting the Polycomb 
Repressive Complex 2 (PRC2) to change the epigenetic 
state of their target genes, the megamind and cyrano 
lincRNAs play essential roles in the organogenesis of 
zebrafish embryos [11], and the lincRNA, Fendrr is 
involved in the regulatory networks controlling the fate of 
lateral mesoderm derivatives [16]. 

In recent years, accumulating evidence have 
suggested that an increasing number of reports have 
shown that IncRNAs play key roles in cancer and diseases 
[17-20]. Studies have associated the aberrant expression 
of IncRNAs with the tumor progression or metastasis, 
for example, PCAT-/ and SChLAP/ in prostate cancer 
[17, 18], HOTAIR in breast [20] and colorectal cancer 
[21]. However, there are still no comprehensive reports 
focusing on the functional roles of IncRNAs in ESCC. 
Very recently, an expression profile study on a large cohort 
of ESCC patients revealed an association between IncRNA 
expression level alterations and the clinical outcome for 
ESCC patients [22]. Besides, other studies have indicated 
that IncRNA expression shows higher tissue-specificity 
than that of protein-coding gene [23], suggesting that 
IncRNAs may have advantages as biomarkers for 
ESCC. Thus identifying ESCC related IncRNAs and 
their functions might provide valuable insights into the 
pathogenesis of ESCC as well as potential biomarker 
candidates. 
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In the present study, we implemented the integrative 
analysis of the long noncoding transcriptome of ESCCs 
and patient-matched normal specimens using a customized 
microarray and public RNA-seq data, thereby identifying 
a number of transcripts differentially expressed in normal 
and tumor tissues. Furthermore, we systematically 
assessed the tissue specificity of the IncRNA expression 
and its association with patient survival. Also, the GSEA 
method was applied to analyze the involvement of the 
ESCC associated IncRNAs in three biological processes 
(cell cycle, metastasis and apoptosis). Finally, we 
experimentally validated the functional role (metastasis) 
for one of the ESCALs, termed Epist, which acts as a 
tumor suppressor in ESCC tumorigenicity. 


RESULTS 


The long noncoding transcriptome profile of 
ESCC 


Esophageal Squamous Cell Carcinoma (ESCC) is 
one of the most deadly forms of cancer across the world. 
In order to investigate the relationship between IncRNAs 
expression and ESCC, we reanalyzed the microarray data 
from pairs of ESCC and matched surrounding normal 
tissues of 119 patients [22] in combination with RNA-seq 
datasets [5, 6] (see Materials and Methods for details). We 
re-annotated the probes in the original microarray data 
(Figure S1A), and the result being that 11629 IncRNAs 
were interrogated by one or more unique probes (see Li et 
al. [22]; Materials and Methods; Table S1). 

Compared with protein-coding genes, the expression 
levels of IncRNAs were markedly lower in ESCC tissues 
(p-value < 2.2e-16; Figure 1A). The expression of most 
(~90%) of the annotated IncRNAs was also extremely 
low or not detected in the RNA-seq data (GSE29968). 
Specifically, 12.58% (1745) of the 13870 GENCODE 
(v19) annotated long noncoding RNAs were detected, and 
similarly, only 6.07% (498) of 8195 lincRNAs from Cabili 
et al. [23] were detected. By contrast, approximately 
70% of the RefSeq genes were robustly detected. These 
differences are in accordance with previous reports that 
the expression levels of IncRNAs are generally lower than 
those of protein-coding genes [23] (Figure 1B-1C). 

Recently, the Rab25 and PTK6 genes were identified 
as tumor suppressors in ESCC, and these two genes 
were markedly down-regulated in ESCC tumorigenesis 
both in previous [5, 6] and in this study (Figure 1B). 
Next, we called the aberrantly expressed protein-coding 
genes across this cohort of ESCC patients (Materials 
and Methods; Table S2), and Gene Ontology enrichment 
analysis of protein-coding genes with aberrant expression 
in ESCC and normal tissues indicated that a number of 
biological processes were enriched, such as cell cycle, 
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Figure 1: Transcriptome analysis of long noncoding RNAs in esophageal squamous cell carcinoma (ESCC). A 
Distribution of the expression levels of long noncoding RNAs (red) and protein-coding genes (blue) represented in the combined microarray. 
Dashed lines indicates the average value (6.799 for IncRNA and 9.963 for protein-coding genes). B., C. The density plot of the abundance 
of protein-coding genes B. and IncRNAs C. (log2-RPKM determined by Cufflinks) in the RNA-seq data. Several recently identified tumor 
suppressor genes (PTK6, Rab25) and IncRNAs that were differentially expressed between ESCC and adjacent normal tissues are indicated. 
D. The Circos plot showing genome-wide differential expression of long noncoding RNAs (930 probes) and mRNAs (3220 probes) 
between ESCCs and matched normal tissues. The outer and inner tracks indicate the IncRNAs and mRNAs, respectively (red, up-regulated; 
blue, down-regulated). The heights of the bars indicate fold differences. E. The barplot shows the biotype fraction of genome-scale, 
microarray assayed, differentially expressed IncRNAs (long intergenic noncoding RNA, Antisense, Sense_overlap and other biotypes). 
F. The volcano plot showing all the IncRNA probes assayed in the microarray and the differentially expressed IncRNAs (ESCALs) are 
indicated by the red dots. (The arrow suggests Epist). G., H. Quantitative RT-PCR analysis of several selected ESCALs in cell lines. G. 
Down-regulated ESCALs (XLOC_012538, NR_024386 and NR_033961); H. Up-regulated ESCALs (XLOC_004273, ENST00000478294 
and HOTAJR). Het-1A is an immortalized normal esophageal epithelial cell line; TE-1, EC109, KYSE510 and KYSE450 are ESCC cell 
lines. Error bars, S.E.M. 
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epithelial cell differentiation (Figure S2). Similarly, 
there were also numerous IncRNAs that were markedly 
up- or down-regulated during the ESCC progression 
(Figure 1C). To further investigate this, we integrated 
the above microarray and RNA-seq data to interrogate 
the differentially expressed IncRNAs between ESCC and 
adjacent normal esophageal tissues. 


LncRNAs are differentially expressed in ESCC 
relative to adjacent normal tissues 


To determine how many IncRNAs_ were 
differentially expressed and have the potential roles during 
the cancer-related processes, the expression detected at 
each IncRNA probe was screened by a stringent criterion 
(as described in Methods). Therefore, of the 13589 probes 
interrogating IncRNAs, 930 probes (representing 827 
IncRNAs) were considered significantly differentially 
expressed in the microarray analysis, including 341 and 
589 probes showing up-regulated and down-regulated 
expression, respectively (Figure 1D-1F). The differentially 
expressed IncRNAs corresponding to these probes were 
termed as “ESCC associated IncRNAs (ESCALs)”. 
And most of ESCALs belong to groups on IncRNAs 
previously annotated “lincRNAs” and “antisense RNAs” 
(Figure 1E). To further confirm the observed differences 
in expression, we randomly selected a subset of ESCALs 
and used qRT-PCR to determine their expression levels 
in an immortalized normal esophageal epithelial cell line 
(Het-1A) and in several ESCC cell lines (Figure 1G-1H). 

Besides, we also explored other two public RNA- 
seq data to profile the long noncoding transcriptome of 
ESCC and adjacent normal tissues (Figure S3A). Given 
the characteristics of the RNA-seq data, we adopted 
distinct methods for calling the differentially expressed 
genes (Document S1 and Figure S3). And the differentially 
expressed genes called by these two RNA-seq data were 
shown in Table S3. In addition, we also searched for novel 
transcripts and fusion transcripts, the methods and results 
were stated in Document S3. 


Aberrantly expressed IncRNAs (ESCALs) across 
human cancers 


Next, we asked whether the identified ESCALs 
have tissue specific expression patterns, or potential to 
be as biomarkers. To this end we conducted the analysis 
of comparing the ESCALs with the IncRNAs expression 
profile in diverse cancers. 

We interrogated the transcriptional profile among 
64 archived human cancers comprising 17 diagnostic 
subtypes of squamous cell carcinomas (head and neck, 
lung, except esophagus), adenocarcinoma as well as 
sarcomas [19]. Only 190 (~23%) out of the 827 ESCALs 
were expressed as assessed by 3SEQ (3’-End Sequencing 
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for Expression Quantification) in these tumor samples, and 
even a smaller fraction (112, ~13.5%) of the ESCALs were 
also observed as differentially expressed across the panel 
of assayed tumor samples (Table S4A-B). One particular 
case is the IncRNA, MALAT-/ [24], which is generally 
among the most highly expressed IncRNAs. MALATI 
shows no marked differences in expression between 
tumors and normal surrounding tissues, suggesting that 
this IncRNA is not associated with the tumorigenicity of 
ESCC. The same applies to some other well-characterized 
IncRNAs, such as ANRIL [15]. 

We also intersected the ESCALs with the 
LncRNADisease database [25], which has collected and 
curated hundreds of IncRNAs associated with diverse 
human diseases. However, only 19 long noncoding RNAs 
from the LncRNADisease were found among the ESCALs, 
these including several IncRNAs such as HOTAIR, PVT1, 
H19 and GASS that have been reported to be dysregulated 
in multiple human cancers. Among these, the expression 
of H19 and HOTAIR was validated in the Het-1A and 
ESCC cell lines by qRT-PCR. HOTAIR, binds physically 
to the chromatin modifying enzymes PRC2 and LSD1 to 
reprogram the chromatin state and further promote cancer 
metastasis in breast cancers [20] and colorectal cancers 
[21], also exhibits oncogenic activity in ESCC [26]. 
Interestingly, two long noncoding RNAs associated with 
barrett’s esophagus or squamous carcinoma, AFAPI-AS/ 
and UCA1, respectively, were also included. There may 
be some suggestions that these IncRNAs play the similar 
functional roles in ESCC tumorigenesis as previously 
reported (Table S4C). 

All the results indicated that the majority of ESCALs 
showed restricted expression patterns to esophagus 
epithelial cell or esophageal squamous cell carcinomas 
(ESCC) and there may be other uncharacterized IncRNAs 
important during the ESCC progression. 


LneRNAs expression and ESCC patient survival 


Our previous study identified a three-IncRNA 
signature, which might serve as a new biomarker for 
the prognosis of ESCC patients [22]. To further explore 
the relationship between the expression levels of certain 
IncRNAs and the survival time of ESCC patients, we 
performed a log-rank survival analysis for every coding 
and noncoding probe assayed in microarray (Materials 
and Methods; Table S5). A total of 1122 probes showed 
significant association with ESCC patient survival, 
including the mRNA, Rab25 [6] and lincRNA, HOTAIR 
[26] previously reported from other groups based on 
different cohorts of ESCC patients (Figure 2A). Among 
the 827 ESCALs, 41 were significantly (log rank test, 
p-value<0.05) associated with the survival of ESCC 
patients. Of which, 11 were up-regulated and 30 down- 
regulated in tumor tissues. Figure 2B-2D showed the 
Kaplan-Meier survival curves across the 119 ESCC 
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patients for the up-regulated HOTAIR and two down- 
regulated ESCALs (/inc-TMEM106A and LOC645638). 
Collectively, this suggests that the expression levels of 
IncRNAs can also be used for survival prediction of ESCC 
patients. 


Expression of IncRNAs is associated with 
metastasis, cell cycle and apoptosis 


It is well described that, deregulation of cell cycle, 
apoptosis and metastasis are critical in tumorigenesis. 
However, little is known about how and how many 
IncRNAs are involved in these biological processes. To try 
to address this question, we applied gene set enrichment 
analysis (GSEA) to evaluate the potential association 
of the ESCALs in cancerous biological processes (cell 
cycle, apoptosis, metastasis), using the relative difference 
(fold change) in IncRNA expression (AE = T - N) and 
correlation with that of protein-coding genes (Materials 
and Methods; Table S6). To assess the validity of the 
predicted functional associations, we examined the 
enriched gene sets for the ncRNA, HOTAIR. The GSEA 
result showed that HOTAIR positively correlates with the 
metastasis gene sets (Figure 3A, 3C), which is consistent 
with the known properties of promoting metastasis [20]. 
Consequently, we found that more than half of the probes 
representing ESCALs (475 out of 930 probes) were 
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functionally associated with metastasis, and one third 
associated with cell cycle processes, while only ~15% 
with apoptosis (Figure 3B-3C). Thus, in all ~87% of 
the ESCALs may be involved in one or more cancerous 
pathways, while ~13% of them could not be associated 
with any of these three biological processes, indicating 
they may be involved in other processes of tumorigenesis, 
such as tumor growth pathway. Taken together, these 
results indicate that the ESCALs may play critical roles 
during the ESCC tumorigenesis. 


Molecular characterization and expression 
pattern of Epist 


To explore the functional significance of IncRNAs 
in ESCC, we focused our attention on a number of 
ESCALs which were relatively highly expressed and 
for which GSEA suggested associations with a specific 
biological function. For these ESCALs, we calculated a 
tissue-specificity score and identified one transcript which 
was significantly (>=2-fold) down-regulated in the tumors 
in almost all patients (107/122, ~88%) (Figure 4A-4C and 
Figure 1F), and which was specifically highly expressed in 
normal esophagus epithelial cells indicated by the RNA- 
seq data (Figure 4G) and epigenetic landscape [27] (Figure 
S4). Then we termed this transcript, esophagus epithelial 
intergenic specific transcript, or Epist. Performance of 
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Figure 2: Long noncoding RNAs associated with survival time of ESCC patients. A. The workflow of screening the mRNA 
and IncRNA probes whose expression level associates the survival time of ESCC patients, as described in Methods. (B, C, D) These panels 
show the Kaplan-Meier survival curves for three lincRNA examples of them. HOTAIR, linc-TMEM106A and LOC645638. 
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5’- and 3’-RACE and Northern Blot determined Epist 
as an ~600 nt long transcript, originating from a locus of 
approximately | kb in chromosome 5. Additionally, 3SEQ- 
Seq analysis further showed that a poly-A signal occurred 
at ~4.5kb upstream of the neighbouring protein-coding 
gene, PITX1 [19]. Although the transcriptional orientation 
of Epist is the same as its nearby genes, the RACE results 
and the presences of a poly-A signal indicated that Epist is 
an independent transcription unit rather than the extension 
of 5’-UTR of PITX1. Also, we used the Cufflinks and 
Trinity software to assemble the gene structure of Epist. 
Both the computational and experimental results suggested 
that Epist is an approximately 600-nt transcript consisting 
of two exons (Figure 4D-4E). 

To characterize the functional relevance of Epist, 
we firstly addressed whether the transcript is indeed 
noncoding. The computational analysis of coding potential 
suggested that the Epist sequence has a very low coding 
potential (Document S2). Secondly, isolation of the 
nuclear and cytosolic fractions of TE-1 cell showed that 
the Epist transcript was mainly located in the nucleus 
(similar to that of U1, Figure 4F). 

Further, we examined the expressional relationship 
between Epist and PITX1. The expression of Epist was 
highly correlated (R=0.95) with that of PITX1 across the 
cohort of ESCC patients (Figure 6A), indicating that there 
is an underlying connection between Epist and PITX1. 
However, we observed that knock-down of Epist could 
not significantly alter the expression of PITX1, and vice 
versa (Figure 4H-41). These results showed that Epist does 
not directly regulate the transcription of PITX1 and vice 
versa, suggesting that Epist may exert its function in trans. 
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Epist inhibits migration and invasion 


According to the above GSEA result, genes 
that are expressionally correlated with Epist are also 
enriched for functions related to metastasis discovered 
by three independent studies (Figure 5A-5B; Materials 
and Methods; Figure S6A). Also, the previously down- 
regulated gene set of esophageal cancer (MSigDB) was 
significantly enriched in Epist-positively-associated genes 
[28] (Figure 5C). To study this further, we performed the 
transwell migration assays using the KYSE30 cell line, 
which has a higher expression of Epist than other cell lines 
(Figure 4C). The results showed that knock-down of Epist 
lead to an increased number of cells migrating through the 
membrane (Figure 5D). Next, we established a KYSE30 
cell line which stably over-expressed the Epist IncRNA 
(Materials and Methods; Figure S5). Overexpression of 
Epist in the KYSE30 cells markedly reduced the effect of 
the migration as assessed by the transwell migration assay 
(Figure 5G). Analysis of the expression of EMT marker 
genes E-cadherin, N-cadherin in mRNA by qRT-PCR and 
western blot also supported the above results (Figure SI, 
5J). Meanwhile, we also performed the transwell invasion 
assays in the loss-of-function and gain-of-function of Epist 
in KYSE30 cells (Materials and Methods). The invaded 
results were similar to those of transwell migration assays 
(Figure 5D-SG). 

Additionally, it is well-established that TGF-B could 
promote cancer metastasis through its efforts on the tumor 
microenvironment [29]. Accordingly, we observed that 
the expression level of Epist was also decreased when the 
cells was treated by TGF-B (GSE54797) [30] (Figure 5H). 

As a third line of investigation, the MTT assay was 
applied to measure the growth rate of KYSE30 cells over- 
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Figure 3: Long noncoding RNAs associated with metastasis, cell cycle and apoptosis. A. Venn diagram of ESCALs probes 
associated with three distinct metastatic gene sets (Jaeger’s melanoma study [54] and Rickman’s and Cromer’s head and neck squamous 
cell carcinoma study [55, 56]). B. Number of probes representing ESCALs, which were positively- and negatively- correlated with KEGG_ 
CELL_CYCLE and KEGG_APOPTOSIS gene collections. C. The heatmap showing the hierarchical clustering of 930 probes of ESCALs 
associated with metastasis, cell cycle and apoptosis gene sets. Red, white, blue indicate positive, no and negative association, respectively. 
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expressing Epist. The results showed that over-expressing 
cell had a lower growth rate than controls (Figure 5K), 
indicating that Epist inhibited ESCC cell proliferation. 


the previous genetic screening report, PITX1 suppresses 
tumorigenicity by down-regulating the RAS pathway 
(oncogenic signaling pathway) through RASALI [31]. 


Thus, we calculated the correlation between the expression 
levels of RASALI and Epist in the cohort of 119 ESCC 
patients. Consequently, we observed that Epist expression 
positively and significantly correlated with RASAL1 
(Figure 6A). Furthermore, sik NA-mediated knock-down 
of either PITX1 or Epist produced similar effects on the 
expression level of RASALI in TE-1 cell line (Figure 


Epist functions as a tumor suppressing IncRNA in 
ESCC 


Next, to address how Epist inhibits the ESCC 
tumorigenesis, we firstly examined the expression 
between Epist and PITX1 target genes. According to 
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Figure 5: Epist inhibits migration and invasion processes. A., B. Epist-correlated genes are enriched in the gene signatures 
of metastasis from Jaeger et al. and Rickman et al. C. GSEA result shows Epist-correlated genes enriched in the gene set of previously 
identified differentially expressed genes in esophagus cancer. D. Representative images of transwell migration (top) and invasion (bottom) 


assays in siRNA-mediated knock-down of Epist. E. The barplot showed the summary 


of the number of migrated (left) and invaded (right) 


cells per field. F. Representative images of transwell migration (top) and invasion (right) assays in KYSE30 cells stably overexpressing 


Epist or control vector sequence. G. The barplot showed the summary of the number 


of migrated (left) and invaded (rigth) cells. H. Epist 


was down-regulated in the TGF-B-treated cells (GSE54797). I., J. The mRNAs I. and protein levels J. of several EMT markers (E-cadherin, 


N-cadherin, Vimentin) after siRNA-mediated loss-of-function of Epist in KYSE30 


cell line. K. Comparison of the cell proliferation 
t §.D. from three 


measured by MTS assay for KYSE30 overexpressing Epist or control vector cell lines. All these data show the mean 4 
independent assays. Statistical significance was determined by Student’s ¢ test. *p < 0.05, **p < 0.01; ***p < 0.001. 
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6B). PITX1 has also been identified as a suppressor of cells. We therefore examined the expression of TERT 
telomerase reverse transcriptase (TERT) in tumorigenesis in KYSE30 cells after knock-down or over-expression 
[32]. Up-regulation of TERT is a key component of the of Epist and found they led to up-regulation and down- 
transformation processes in many malignant cancer regulation of TERT level, respectively (Figure 6C-6D). 
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Figure 6: Epist acts as a tumor suppressor during ESCC tumorigenesis. A. Scatter plot showing the expressional relationship 
among Epist, PITX1 and RASAL1. RASALI is one of the downstream targets of PITX1. The upper right squares show the Pearson 
correlation values between each other, and lower left squares show the scatter plot matrix and fitted expressional trend lines for the 
same comparisons. The black circle and the red triangle represent the expression score in tumor tissues and matched normal tissues, 
respectively. The diagonal shows the expression distribution across the 119 ESCC paired tissues. B., C. siRNA-mediated loss-of-function 
of Epist results in the expression alteration of PITX1-target genes, RASAL1 in TE-1 cell B. and TERT in KYSE30 cell C.. D. KYSE30 
stably overexpressing Epist cell line shows significantly reduced expression level of TERT than control. E. Heatmap of the differentially 
expressed genes revealed by microarray-based analysis between the KYSE30 cells overexpressing Epist and control RNAs, including 95 
and 99 down- and up-regulated genes, respectively. F. The volcano plot shows the expressional alterations of all annotated genes. The red 
and blue points represent the up- and down-regulated genes, respectively, and corresponding ovals enclosed several previously identified 
cancer-associated genes (Up-regulated: FAT3, CALR Down-regulated: VIM, GATA6, DDIT3, HERC2, UPP1, S100P, IFITM1, SHOX2, 
PTPRM, CAPN12). G. The selected genesets from the GSEA analysis. The y axis represents the normalized enrichment score and the 
relative circular area represents the significant level (-log10(Nominal p-value)). 
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Besides, we performed the microarray analysis 
to determine which genes’ expression was altered by 
overexpressing Epist through comparing the gene 
expression profile between KYSE30-overexpressing- 
Epist cells and controls, thus identified 95 and 99 down- 
regulated and up-regulated genes, respectively (Figure 
6E). Expression levels of several previously reported 
genes implicated in cancer development and progression 
were altered, such as DDIT3, GATA6, UPPI, FAT3 
(Figure 6F). These genes are involved in tumorigenesis 
through diverse mechanisms. In addition, the GSEA 
results also revealed plenty of biological processes were 
changed upon Epist overexpression (Figure 6G). 

Collectively, all these results indicated that Epist 
acts as a tumor suppressor in ESCC tumorigenesis. 


Search for novel lincRNA transcripts and gene 
fusions in ESCC 


In contrast to microarrays, RNA-seq can be 
employed to identify the novel transcripts across 
the genome. We used the single-end RNA-seq data 
(GSE29968) to identify novel lincRNA candidates 
according to the pipeline described in Materials and 
Methods (Figure S1B), and identified 31 novel lincRNA 
candidates (Table S7). The expression profile of these 
novel lincRNAs across the 3 ESCC paired tissues are 
given in Table S7. 

Also, RNA-seq data can be used to find the gene 
fusions and has already led to the discovery of several 
gene fusions in some cancers [33], such as the BCR-ABL/ 
fusion in K562 cell line. We employed the above RNA- 
seq data to detect expressed gene fusions. However, no 
recurrent gene fusions were identified in these data. The 
depth of the RNA-seq data we analyzed is comparable 
to that in a work identifying recurrent rearrangements of 
CIITA in Hodgkin lymphoma cell lines [34]. On the other 
hand, gene fusions were neither detected in an analysis on 
the Sézary syndrome [35]. Although we do not observe 
the gene fusions in ESCC, we still cannot rule out the 
possibilities that the fusion transcripts exist, such as the 
heterologous gene linking to an oncogene. 


DISCUSSION 


The human genome encodes a large number 
of IncRNAs that are dynamically expressed across 
different developmental stages, that show highly tissue-, 
differentiation- and lineage-specific expression patterns, 
and that are implicated in diverse biological processes 
[36]. To date, a considerable number of IncRNAs have 
been identified and demonstrated to play important roles 
in many human cancers [17, 19]. In addition, the progress 
of high-throughput technology in recent years, such as 
microarray and RNA-seq, has made it much easier to 
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profile the noncoding transcriptome than before. 


LncRNAs are extensively dysregulated in ESCC 
tumorigenesis 


The work presented here investigated the differences 
in the long noncoding transcriptome profile between the 
esophageal squamous cell carcinoma and adjacent normal 
tissue using both microarray and deep RNA-seq data. 
The study identified several hundred ESCC-associated 
IncRNAs (ESCALs), the majority of which had previously 
not been reported to implicate in tumor progression. 

Although a large number of IncRNAs have shown 
to be involved in diverse biological pathways, the 
molecular mechanisms by which most IncRNAs act are 
still not clearly understood. One of the emerging themes 
of IncRNA is controlling of gene expression. There are a 
number of IncRNAs reported to play gene regulation roles 
through recruiting chromatin modifying complexes [37], 
such as HOTAIR and the PRC2 complex [20] and HOTTIP 
and the WDR5/MLL complex [38]. Some IncRNAs are 
also function through guiding transcription factors to 
their downstream targets, such as Paupar with PAX6 
[39] and THRIL with hnRNP-L [40]. In ESCC cancer cell 
lines, Epist functions as a tumor suppressor via affecting 
the expression levels of a number of cancer-associated 
genes (Figure 6E, 6F). Although Epist apparently does 
not affect the expression of PITX1, we have shown that 
Epist may regulate the expression levels of at least two 
genes (RASALI, TERT), targets of PITX1. However, we 
observed the facts that Epist and PITX1 are coordinately 
activated, perhaps they are co-located and share the 
upstream regulatory elements, thus co-expressed (Figure 
6A). However, we did not detect physical interaction 
between Epist and PITX1 (data not shown). The detailed 
molecular mechanisms of Epist exerts are still needed to 
further investigate. 

There are several ESCALs showing fold difference 
greater than +/-20. Of which, CASC9 (Homo sapiens 
cancer susceptibility candidate 9, LINC00981) has been 
recently validated in other cohorts of ESCC patients 
and reported to play the oncogenic roles in ESCC [41]. 
LOC645638 is another example of ESCAL that was 
detected down-regulated markedly in all three datasets (~5 
fold). Moreover, this IncRNA was significantly associated 
with the ESCC patient survival (Figure 2D) and, like 
Epist, was also inferred to have biological relevance to 
metastasis and cell cycle processes (Table S6). A recent 
report has shown that LOC645638 (Inc-DC), which is 
exclusively expressed in human conventional dendritic 
cells, promotes the phosphorylation of STAT3 Y705 by 
preventing STAT3 binding to and dephosphorylation by 
SHP1 [42]. Future studies should be addressed whether 
this IncRNA has tumor suppressive activity, and if so, 
whether these mechanisms are important in this aspect. 
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Inferring the biological functions of IncRNAs with 
paired tissues data 


One of the major challenges concerning the study of 
IncRNAs is to determine their biological functions. Here 
we used the GSEA tool to predict the biological roles 
for the IncRNAs in ESCC paired tissues data through 
the “guilt-by-association” method [9]. For this purpose, 
the fold difference in expression (AE = T - N) in tissues 
(and not the expression value itself) was assigned to 
each probed IncRNA and protein-coding gene. Because 
the genomic landscape in ESCC was similar to that in 
head and neck squamous cell carcinomas (HNSCC), two 
metastatic gene signatures applied to HNSCC were used 
[43]. The correlation between IncRNAs and mRNAs was 
computed based on the fold difference values. The fold 
difference may reflect transcriptional responses in the 
gene regulatory network. Aberrantly expressed HOTAIR 
previously reported to promote metastasis, was taken as 
a positive control [20]. Our work here is the first report 
to establish the utility of paired tissues data to infer 
functional roles of IncRNA in cancerous pathways. 


Epist may play critical roles in development and 
differentiation 


Recent efforts have shown that IncRNAs have 
higher tissue-specificity and less conservation than those 
of protein-coding genes [36]. In accordance with this 
notion, Epist is highly expressed in the esophagus (and 
to some extent in the prostate and colon). Epist is also not 
appreciably conserved in vertebrates, although it appears 
to be conserved among primates, indicating of a more 
recent evolution (Figure S7). 

Besides, the GSEA results suggested that among 
the genes, which were positively correlated with Epist in 
the microarray data, there were significantly enriched for 
genes belonging to the epithelial differentiation module 
(Figure S6B). Also, the bivalent domain (a chromatin 
region marked by both the active mark, H3K4me3 and 
the repressive mark, H3K27me3) [44] occurs at the 
promoter region of Epist and PITX1 in the H1-hESC cell 
line (ENCODE ChIP-seq data) (Figure S6C). Bivalent 
domains were proposed to represent a state in which the 
developmental and differential genes in ES cells are silent 
while at the same time being kept poised for activation 
[44]. Previously, IncRNA Fendrr, which controls the cell 
fate of lateral mesoderm [16] and in lung development 
[13], also possesses a bivalent domain in ES cells, and 
available ChromHMM data also showed signs of the 
poised promoter state at the Epist locus in diverse cell 
types. PITX1 has been reported to plays critical roles 
in specifying hindlimb morphology [45], thus, taken 
together, we may further postulate that bivalent chromatin 
marks Epist having essential roles in development and 
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differentiation similar to those of PITX1. 

Further investigation should address the functional 
significance of other ESCALs in order to elucidate 
the detailed mechanisms by which they exert. The 
application of recently developed technologies to capture 
the chromatin binding profile of ncRNAs on a genome 
scale, such as chromatin isolation by RNA purification 
(ChIRP) [46] and RNA antisense purification (RAP) [47], 
will probably contribute to unveil the detailed molecular 
mechanisms of how IncRNAs control gene expression and 
chromatin state. 

This study on a Chinese patient group has yielded 
new insights into the pathogenesis of ESCC, providing 
evidence that long noncoding transcripts may be regulators 
of tumorigenesis. Further work on the detailed mechanism 
by which IncRNAs act will be important for understanding 
how these transcripts may affect disease or cancer states 
and could provide key insights into therapeutic targeting. 


MATERIALS AND METHODS 


Microarray and RNA-seq data analysis 


The transcriptome profiles for 119 pairs of ESCCs 
and patient-matched normal tissues, were measured by 
the combined mRNA + IncRNA V2.0 microarray on an 
Agilent platform. The microarray data had previously 
been deposited on the Gene Expression Omnibus, under 
accession number GSE53625. Data processing was similar 
to our previous work [22]. Briefly, for the noncoding 
RNA probes, we established a probe map for mapping 
the probe sequences uniquely to the long noncoding RNA 
collections (GENCODE v19, UCSC NR* transcript and 
Cabili M. N. et al. [23]). Median values were used for 
multiple probes corresponding to the same noncoding 
gene. Quantile normalization, and imputation for missing 
values using a KNN-based method were processed in 
R. For calling of differentially expressed genes, the fold 
differences (>=2), FDR (<0.001) and average expression 
were used (the probed transcripts were discarded, if both 
normal and tumor values were less than the average value 
across the samples. The average expression value is 6.799 
for IncRNA and 9.963 for protein-coding gene, shown 
as the dash lines in Figure 1A). For the Gene Ontology 
enrichment analysis, the DAVID webserver [48] was used. 

The methods calling of differentially expressed 
IncRNAs for RNA-seq data were documented in Document 
S1. We computed the FPKM value as the expression level. 
The detailed summary of the high-throughput data is 
listed in Figure S2. Genome coverage wiggle files were 
generated with BEDTools [49]. Additional RNA-seq data 
from human tissues, cell types and cancer cell types were 
retrieved from the Illumina Human Body Map Project 
[23] and the human ENCODE project [50]. To identify the 
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novel lincRNA transcripts, single-exon transcripts were 
eliminated from the Cufflinks-assembled transcriptome 
[23]. De novo transcriptome assembly was processed by 
the Trinity software [51]. 


ESCC cell lines and culture conditions 


The ESCC cell lines (KYSES510, KYSE450, 
KYSE150, KYSE30, EC109 and TE-1) were purchased 
from Cell Resources Center, IBMS, CAMS/PUMC. 
The immortalized normal esophageal epithelial cell line 
Het-1A was purchased from ATCC, not authenticated 
before use. The STR profile of KYSE30 cell line was 
authenticated at GENEWIZ, Beijing. The TE-1 and EC109 
cell lines were authenticated by Cell Resources Center, 
were maintained using standard media and conditions. 
Specifically, KYSE510, KYSE450, KYSE150, KYSE30, 
EC109 and TE-1 cells were maintained in RPMI 1640 
(Life Technology) supplemented with 10% FBS. Het-1A 
was maintained in BEBM (Lonza). The flasks used to 
culture Het-1A cells were pre-coated with a mixture of 
0.01 mg/mL fibronectin, 0.03 mg/mL bovine collagen 
type I and 0.01 mg/mL bovine serum albumin dissolved 
in culture medium. K YSE30 stably overexpressing Epist 
or control vector cell lines were generated by transfected 
lentiviral constructs encoding Epist or blank RNA for 48 
h. GFP-positive cells were selected on 2 ug/ml puromycin 
for one week. 


Transcriptome-wide analysis of KYSE30 cell line 
overexpressing Epist 


Total RNAs were isolated from the KYSE30 
cells overexpressing Epist or control vector using the 
Trizol (three biological replicates), followed by DNasel 
treatment. The quality of RNAs were assessed with 
Nanodrop 2000. 500 ng RNAs were used to generate 
fluorescence-labelled cDNA for hybridization according 
to the Agilent standard workflow. The microarray 
hybridization experiment was performed at the CapitalBio 
Corporation, Beijing. Arrays were scanned with the 
Agilent DNA Microarray Scanner and processed with the 
Agilent Feature Extraction v10.7, then quantile normalized 
using the R limma package. The fold change and two- 
tailed Student’s ¢ test were applied for statistical analysis 
of differentially expressed genes. Heatmap and Scatterplot 
were drawn with the functions in R. The microarray data 
have been deposited in the GEO database under accession 
number GSE64792. 
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Rapid amplification of cDNA ends (RACE), 
northern blot and quantitative RT-PCR (qRT- 
PCR) 


Total RNAs were isolated from the cell lines (Het- 
1A, EC109, KYSE510, KYSE450, TE-1, KYSE30, 
KYSE150) with the Trizol reagent (Invitrogen) according 
to the manufacturer’s instructions, and then subjected 
to DNasel treatment. 3’- and 5’-RACE were performed 
using the RLM RACE kit (Ambion AM1700) and 5’- 
RACE System for Rapid Amplification cDNA Ends (Life 
Technology, 18374-058). 

For the qRT-PCRs, the SuperScript III reverse 
transcriptase was employed to synthetize the first-strand 
cDNA. qPCR analysis was performed with 0.5 yL cDNA 
as template in a 20 uL reaction volumes using SYBR 
green master mixture on the Rotor-Gene® Q real-time 
cycler (Qiagen). The expression levels were normalized 
to GAPDH and the relative expression was calculated 
by the delta-delta Ct method. All analyses shown were 
carried out on at least three biological replicates for each 
sample; P-values were calculated by Student’s t-test from 
the biological replicates. 

The Northern blot was performed as described [52]. 
All primers used are listed in the Table S8. 


Nuclear localization 


The nuclear and cytoplasmic fractions of TE-1 cells 
were isolated with the NE-PER Nuclear and Cytoplasmic 
Extraction Reagents (Thermo). The total RNA isolated 
from the subcellular fractions was assessed by qRT-PCR. 
GAPDH and ACTB serve as the cytosolic controls, and 
U1 as the nuclear control. 


Survival analysis with probed genes for ESCC 


For each probe in the microarray assay, we 
performed in parallel survival analysis with the expression 
value (T) of tumor tissues and the difference in expression 
value between tumor and adjacent normal tissues (AE = 
T - N). We labelled the tumor expression or expression 
difference of each probe as “high” or “low” according to 
whether the value was higher or lower than the average 
of T or AE, respectively, across the 119 pairs of ESCC 
samples. The log-rank test was used to measure whether 
the survival time was significantly different between the 
“high” and “low” expression of patient groups, the probe 
was included for further consideration if p-values for both 
T and AE were less than 0.05. The Kaplan-Meier plots 
were drawn using the R packages. Clinical information 
concerning the patients is found in Li et al. [22]. 
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Gene set enrichment analysis 


For the GSEA [53], fold difference (AE = T - N) 
of every probe of ESCALs in the microarray across the 
119 paired samples was considered as a numeric vector, 
and all protein-coding genes were ranked by the Pearson 
metric with our python script. Genes whose correlation 
was measured by multiple probes were consolidated 
using the maximum values obtained with these probes. 
We employed the GseaPreRanked tool to identify the 
enriched gene-set collections in c2.all.v4.0.symbols. 
gmt of MSigDB with a permutation of 1000. The 
threshold for the nominal p-value and FDR q-value were 
set to 0.01 and 0.05, respectively. For calculating the 
correlation with metastasis, the three paired gene sets 
from the MSigDB Chemical and Genetic Perturbations 
(CGP) categories were taken to represent metastasis 
gene signature (JAEGER METASTASIS UP and DN 
[54], RICKMAN METASTASIS UP and DN [55], 
CROMER METASTASIS UP and _DN [56]). The 
KEGG_CELL_CYCLE and the KEGG_APOPTOSIS get 
sets were used to assess the associations with cell cycle 
and apoptosis, respectively. 


Lentivirus production and purification 


The Epist sequence was cloned into the lentivector 
pCDH-MSCV-MCS-EF1-GFP+Puro cDNA Cloning 
and Expression Vector (SBI CD713B-1). 293T cells 
were cultured in T-175 flasks at 40%~50% confluence 
before transfection. Transfection was performed using 
Lipofectamine 2000 (Life Technologies). For each flask, 
20.25 ug of lentivectors, 13.2 ug of pMDL-g/p, 5.1 ug 
of RSV-REV and 7.125 ug of VSVG were added to 4 ml 
OptiMEM (Life Technologies). 100 ul of Lipofectamine 
2000 was diluted in 4 ml OptiMEM and it was added to 
the plasmid mixture after 5 min. The complete mixture 
was incubated at room temperature for 20 min before 
being added to cells. After 6 h, the medium was changed 
to 30 ml DMEM + 20% FBS. The medium was collected 
at 48 h and 72 h and centrifuged at 3,000 rpm at 4 °C 
for 10 min to pellet the cell debris. The supernatant was 
filtered through a 0.45 um low protein binding membrane. 
The virus was ultracentrifuged at 20000 g for 2 h at 4°C 
and then resuspended overnight at 4°C in PBS. Aliquots 
were stored at -80°C. 


Transwell migration assay 


For knock-down of Epist, KYSE30 cells were 
transfected with siRNA, trypsinized after 24 h, and 
counted with a Coulter counter. For overexpression of 
Epist, KYSE30 cells stably expressing Epist or the control 
vector were trypsinized , and counted with a Coulter 
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counter. 35,000 cells were seeded in the upper chamber of 
Transwell (Corning 3422) with no serum medium, while 
10% FBS medium was added to the lower chamber as a 
chemoattractant. After 24 h incubation at 37°C, 5% CO, 
atmosphere, the non-migrating cells were gently removed 
with a cotton swab. Migrated cells located on the lower 
side of the chamber were stained with crystal violet, air 
dried and photographed. Migrated cells were eluted with 
10% acetic acid, and absorbance measured at 560 nm 
using spectrophotometer. 


Transwell invasion assay 


We diluted thawed BD Matrigel matrix (BD 
356230) into coating buffer (0.01M Tris pH 8.0, 0.7% 
NaCl) to a final concentration of 250 pg/ml, mix gently 
and thoroughly, followed by adding 0.1 ml of the diluted 
Matrigel matrix coating solution into the upper chamber 
of the Transwell (Corning 3422) and incubating the coated 
Transwell plate at 37°C for 2 h. Any pipets, syringes, or 
containers that will come in contact with BD Matrigel 
matrix must be chilled prior to use. Cultured 100,000 cells 
in the coated upper chamber, the rest operation is same as 
Transwell migration assay mentioned above. 

All the images of Transwell migration and invasion 
assays were captured using an Olympus microscope 
imaging systems. 
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